CHAOS ORDER 1/30/77 dam
**** DRAFT ****
Goals
Non-goals
Hardware Assumptions and Constraints
User appearance
        Connections
        Contact Names
        ITS implementation
        Lisp Machine implementation
Network Control Program
        Packets
        Host numbers
        Packet numbers
        Indices
        Media Handlers
        Flow Control
        Operations
        Buffers
        ITS System Calls
Comparison with LCSNET
Transmission Media
        Ethernet
        TEN11 Interface
        DL10 & DTE20
        Asynchronous line
Higher-Level Protocols
        Telnet
        File Access
        Mail
        Locate-named-service
>Goals
High speed communication between processes running in various local machines.
By "high speed", I mean much faster than the Arpanet.
No undetected errors in data transmission.
Not to depend on a particular medium. (However, we are compromising
by picking a fixed packet size. The simplicity and efficiency are worth it.)
Simple enough to put in small pdp11's. Also, simple to the user.
As much power as the Arpanet but, hopefully, a lot less hair.
Work well for both "telnet" and "file transfer."
The initial implementation in ITS should have the "in-system" part as
small and simple as possible.
>Non-goals
Byte sizes other than 8 bits. (pdp10 binary transmission should
be part of a user-level file-transfer/ML-device protocol.)
Compatibility with the Arpanet.
Substituting for TEN11 interface functions such as running
the AI TV11 and XGP.
>Hardware Assumptions and Constraints
Transmission is physically in "packets" which have headers, rather
than in, e.g., continuous streams.
The prototype caiosnet (ether) interface limits the physical length
of a packet to 1025 bits, including overhead bits. The net result is
that the maximum number of data bytes in any packet is 104. This limitation
will be extended to the whole network (to keep things simple).
All transmission media will be assumed to be highly-reliable
but not perfect; "perfect" reliability will be assured by having
the two ends of a connection use an acknowledgement protocol
which detects lost messages. Transmission media are required
to lose any messages that they don't deliver intact. (I.e. there
must be hardware checksums.)
The acknowledgement protocol must be designed not to limit
performance.
Statistical flow control (see below).
>User appearance
The network allows user processes in various machines to
communicate with each other in various ways, for instance,
in imitation of a terminal, or in imitation of a disk file
system. These facilities are built on top of the basic
capability to send "packets" (a header plus some data in the
form of 8-bit bytes) through the network. The network undertakes
never to lose or garble any packets, except when the connection
is cut off entirely.
This document defines the low-level, "in-system" part of the
protocol. On top of this, special programs (running in user-mode)
will implement the higher-level protocol that the general user
program sees. These protocols and programs won't be discussed
further in this document, but remember that the strange packet
formats and so forth are not seen by most user programs.
>>Connections
When two processes wish to communicate, they establish a
connection between them. This connection allows two streams
of packets to flow, one in each direction. [Explain why
connections should be bi-directional rather than uni-directional.
Basically that's what you always want, and it makes things simpler.]
Connections are essentially the only facility provided by the network.
However, when first establishing the connection it is necessary
for the two processes to contact each other, and make each
other known to their respective operating systems. In addition,
it is often the case (in the usual user-server situation) that
one of the processes does not exist beforehand, but is to be created
and made to run a specified program.
>>Contact Names
The way we choose to implement contacting is to say that one process
is always a "user" and one process is always a "server". The server
has some "contact name" to which it "listens". The user requests its
operating system to connect it to a specified contact name at a
specified host. If a process at that host is listening to that
contact name, the two are connected. If no one is listening to that
contact name, the operating system must create a server process
which will load itself with the appropriate program and connect up.
Discovering which host to connect to in order to obtain a given service
is an issue for higher-level protocols. It will not be dealt
with at all initially (that is, there will be a table of host
names and numbers and the user will have to enter the name.)
Once the connection has been established, there is no more need for
the contact name, and it is discarded. Indeed, often the contact name
is simply the name of a network protocol (such as "telnet") and several
users may want to have connections to that service at the same time,
so contact names must be "reusable." (In the other common case, the
contact name will be a "gensym".)
As far as the operating systems involved are concerned, contact names
are simply arbitrary ascii strings defined by user programs. It is
expected that the various higher-level protocols will define standard
contact names; for instance, to get the telnet protocol one would
connect to "telnet server"; to get the file transfer protocol one
would connect to "file transfer server". If a machine receives a
request to connect to a contact name which no one is currently listening
to, a server process must be created and made to execute a program
which decides, from the contact name, what server program to load
and execute, or else to refuse the request for connection.
Contact names have no relation to file names; they are simply
a device for introducing two processes to each other. If one was
using the network to transfer a file, one would first contact
the file transfer server at the appropriate host, then send a
packet containing the name of the file to be accessed.
>>ITS system calls
Ordinary user programs will not access the network directly; they will
go indirectly through a job-device or sty-type program which will
use a higher-level protocol to make the network look like what the
user wants, the traditional things being a terminal and a disk
file system.
Since these intermediate user-mode programs for using the network will
exist, there is no reason for the interface to the low level network
provided by the system to look at all like a standard device. Instead,
it will be designed solely for simplicity and ease of implementation,
and for a certain degree of efficiency. This interface will be
described after the interface between Network Control Programs in
different machines (the low-level protocol) is described.
At some future time the intermediate programs might get moved into the
system for reasons of efficiency, but that should not be allowed to
complicate the initial implementation.
>>Lisp Machine implementation
In the case of the Lisp Machine, the only distinction between user
programs and system programs is who maintains and documents them,
and how carefully. The code will be modularized in about the same
way as in the ITS implementation.
>Network Control Program
This is the part of the operating system(s) that implements the network
(obviously).
>>Packets
The NCP's operate by exchanging packets. A packet consists of a
header containing control information, and zero or more 8-bit bytes of
data. Hardware restrictions of the prototype CAIOS net interface
restrict the maximum length of a packet to 61 16-bit words. In fact,
we will limit it to 60 words (to make packet buffers in pdp10's be 32
words including two overhead words). Again for the convenience of
pdp10's, the header should be an even number of 16-bit words.
In this section the packets will be described as they look to a pdp11.
They look the same inside a Lisp Machine, since the byte structure is the
same. Inside a pdp10, packets are stored with two 16-bit words
left-adjusted in each pdp10 word. Additionally, the bytes in the data
portion of the packet are swapped so as to put them in pdp10 standard
order. pdp11's that communicate (directly) with pdp10's will be required
to do this byte swapping since they're likely to have more time available
than the 10 to do it in, and can also do it faster, having a special
instruction for it. pdp10's that communicate directly to the network will
have hardware assistance for byte reshuffling in their interfaces. See the
transmission media section for how packets are encapsulated during
transmission through the various media.
The header is 8 16-bit words and contains the following fields:
-------------------
| opcode | nbytes |
-------------------
| <not used> |
-------------------
|destination host#|
-------------------
|destination index|
-------------------
| source host # |
-------------------
| source index |
-------------------
| packet # |
-------------------
| ack packet # |
-------------------
opcode - tells the receiver of the packet how to interpret
it. See the Operations section below.
nbytes - the number of 8-bit bytes of data in the data part.
The maximum value of nbytes is 104. The minimum is 0.
destination host #
destination index - index for this connection assigned by the
destination host's NCP.
source host #
source index - index for this connection assigned by the
source host's NCP.
packet # - an ascending reference number used in error and
flow control (see below).
ack packet # - used in error and flow control (see below.)
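[Illustrative only: a C rendering of the packet layout as seen by a pdp11
or Lisp Machine. The field names, and the assumption that opcode and nbytes
are each 8-bit bytes sharing the first 16-bit word, are inventions of this
sketch, not part of the protocol definition.]

    /* A Chaos packet: eight 16-bit header words followed by 0-104 data bytes. */
    #include <stdint.h>

    #define CHAOS_MAX_DATA 104        /* maximum data bytes in any packet */

    struct chaos_header {
        uint8_t  opcode;              /* how to interpret the packet */
        uint8_t  nbytes;              /* number of 8-bit data bytes, 0..104 */
        uint16_t unused;              /* <not used> */
        uint16_t dest_host;           /* destination host # */
        uint16_t dest_index;          /* index assigned by the destination NCP */
        uint16_t src_host;            /* source host # */
        uint16_t src_index;           /* index assigned by the source NCP */
        uint16_t packet_num;          /* ascending packet #, modulo 2**16 */
        uint16_t ack_packet_num;      /* latest packet # being acknowledged */
    };

    struct chaos_packet {
        struct chaos_header hdr;
        uint8_t data[CHAOS_MAX_DATA];
    };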
>>Host numbers
Each machine has an identifying number. Since these are totally
arbitrary, I will assign some right now to avoid any confusion.
0 not used.
1 AI
2 ML
3 DM
4 MC
5 Micro automation
6 LOGO
7 Plasma physics
10 Chess machine
11 Lisp machine #1
12 Lisp machine #2
Note that these are not the same as the host numbers used by the
physical chaosnet hardware.
>>Packet numbers
Each time the sending user puts another packet into the network, this
number is increased by one. (These numbers are independent for the
two directions of a connection.) The receiver uses these numbers to
get the packets into order and ensure that there are no duplications
nor omissions. The packet numbers are 16 bits and wrap around to zero
when they overflow. When the connection is first opened, an initial
value for the packet# is established. If it was 0, then the packet#
of the first data packet would be 1.
The receiver returns packet numbers to the sender in the "ack packet #"
field of packets going in the opposite direction on the same connection.
The number in this field is the number of the latest packet successfully
received by the receiver. The sender need not make any further attempt
to send packets numbered less than or equal to this. More on this below.
Packet #'s should be compared modulo 2**16. On pdp11's, use
        CMP A,B
        BMI <A is less>         (BMI rather than BLT or BLO)
On pdp10's, use
        SUB A,B
        TRNE A,100000           (rather than CAMGE A,B)
        JRST <A is less>
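[The same comparison as a C sketch; the function name is invented. It tests
the sign bit of the 16-bit difference, exactly as the BMI and TRNE sequences
above do.]

    #include <stdint.h>

    /* Nonzero if packet number a is "less than" b, comparing modulo 2**16. */
    static int pktno_less(uint16_t a, uint16_t b)
    {
        return (uint16_t)(a - b) & 0x8000;    /* bit 15, i.e. 100000 octal */
    }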
>>Indices
Each connection has two indices assigned to it, one at each end. Each
index is an arbitrary 16-bit number assigned by the NCP at its end; usually
it is an index into that NCP's tables. Indices are required to be
non-zero. For maximum simplicity, all packets include both indices. The
receiver of a packet uses the destination index to find out who to give the
packet to. Generally the source index is used only for error checking, but
when a connection is first opened the source index has to be saved and used
as the destination index in future packets in the reverse direction.
A user process's "capability" or "channel" to a connection, used by it
to ask the NCP to operate on that connection, simply contains the
appropriate index.
Associated with each index the NCP has a "state", the host # and index
# of the other end of the connection, some read buffers and associated
variables, including a current packet #, and some write buffers and
associated variables, again including a current packet #.
The "state" can be Closed (no connection or other activity currently
associated with this index), Open (this index has a connection to
another index at another machine), RFC-sent (requested another machine
for a connection, but no answer yet), Listen (listening for a request
for connection to a certain contact name), Broken (connection closed
abnormally by network or machine lossage), and RFC-received (waiting
for a server process to get going and pick up a request for connection
that came in).
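[For illustration, a C sketch of the per-index state; the identifiers are
invented here. ITS's own codes for these states (%CSCLS and so forth) are
listed under WHYINT in the ITS System Calls section.]

    /* Possible states of a connection index. */
    enum chaos_state {
        CS_CLOSED,          /* no connection or other activity on this index */
        CS_OPEN,            /* connected to an index at another machine */
        CS_RFC_SENT,        /* requested a connection, no answer yet */
        CS_LISTEN,          /* listening for an RFC to a certain contact name */
        CS_BROKEN,          /* closed abnormally by network or machine lossage */
        CS_RFC_RECEIVED     /* waiting for a server process to pick up an RFC */
    };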
>>Operations
This section tells what the values of the opcode field in a packet are, and
how an NCP responds to each one.
1 RFC - Request for Connection
This message is sent from user to server in order to open a
connection. The data contains the contact name. The destination
index is zero, because it is not known yet. The responses are: OPN,
if a server process is found or created that wishes to accept the
request and open up a connection; CLS, if the connection cannot be
opened. There may also be no response, if the RFC was lost in the
network, or the destination host is down but not known to be down, or
the reply was lost in the network. The network software guarantees
reliable transmission once a connection has been established, but not
of the control messages that initially establish a connection, so
appropriate time-outs must exist (perhaps only in the user process
that asks the NCP to issue an RFC).
The packet # field contains the first packet # that will be assigned
to data transmitted from the user process, minus one modulo 2**16. In
the simplest case, this can be zero, and the first packet sent will be
packet # 1. One might also imagine uniquizing the packet numbers
as an extra error check, but this should not be necessary.
The ack packet # field contains the initial window size for packets
sent to the sender of the RFC (see below). This field is used because
there can be no acknowledgement going on, since there is no connection yet.
2 OPN - Connection Open
This is the positive acknowledgement to RFC. The source index field
conveys the acknowledger's connection index to the requester. The packet #
field contains the first packet # that will be assigned to data transmitted
from the server process, minus one modulo 2**16.
The ack packet # field contains the initial window size for packets
sent to the sender of the OPN (see below). This field is used because
there can be no acknowledgement going on, since there is no connection yet.
3 CLS - Connection Closed
CLS is the negative response to RFC. It indicates that no server was
listening to the contact name, and one couldn't be created, or for
some reason the server didn't feel like accepting this request for a
connection, or the destination NCP was unable to complete the
connection (e.g. connection table full.) The destination index will
be the source index of the RFC. The source index will be zero because
the NCP did not put this connection into its tables. The data bytes,
if there are any, contain an ascii explanation.
CLS is also used to close a connection after it has been open for a while.
In the Arpanet, the NCP undertakes not to close the connection when the
user requests it, but waits until all data transfer has completed. This is
a source of extra complexity, since data transfer may be hung up, there
have to be timeouts, there have to be connections waiting to be closed
which aren't owned by any user, etc. It seems simpler to make CLS take
effect immediately, and let the user processes assure that data transfer
has been completed. Note that telnet-like applications don't need it, and
ftp-like applications have to have it separately from closing anyway.
200-377 DATA - Transmits Data
The data portion of the packet is data being sent through the connection.
The packet # is a number that increments by one for each data packet sent
in this direction on this connection. This is used to detect lost packets
(which includes packets garbled in transmission and packets lost in the
statistical flow control scheme) and duplicated packets (caused by lost or
delayed acknowledges). The NCP undertakes to deliver the packets to the
destination process in the same order that they came from the source
process, with no duplications and no omissions. Note that any opcode with
the sign bit on is a data packet as far as the NCP is concerned; if they
wish, higher-level protocols may use the opcode field to define various
different kinds of data packets. Thus, what is herein called a data packet
may be a "control" packet to a higher-level protocol.
4 NOP - no-operation
This type of packet does nothing special to itself. However, like all
packets it contains an "acknowledge packet number" field. The main use for
NOP packets, then, is as a vehicle to convey acknowledgements. The packet#
field of NOP is not used (i.e. NOPs are not themselves acknowledged.)
5 WIN - set window size
The byte count is always 2. The two data bytes, when combined into a
16-bit word in the pdp11's way (data2*400+data1), give the window size
desired for packets sent through this connection to the sender of this
message.
The window size is the maximum number of outstanding unacknowledged data
packets which the sender may have at any one time. If the sending user
process tries to transmit additional data packets on this connection,
it should be made to wait until some packets have been acknowledged.
The intention of the window size is to regulate how often "acknowledges"
must be returned. It's the same as in DSP and TCP.
[[Explain more?]]
No sender is actually required to pay any attention to the window
size. No receiver is actually required to set the window size to
something reasonable. However, those hosts that want to maximize
performance should do something about the window size. The size
is initially set during the RFC/OPN dialogue, presumably according
to the type of protocol being used. An NCP may, if it chooses, use
the WIN packet to dynamically adjust the window size according to
observed network behavior. Maybe.
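[A C sketch of the WIN byte combination and of the check a sender would make
before transmitting another data packet. The names are invented and the
details are only illustrative.]

    #include <stdint.h>

    /* Combine the two WIN data bytes the pdp11 way: data2*400 + data1
       (octal), i.e. low-order byte first. */
    static uint16_t win_size(const uint8_t *data)
    {
        return (uint16_t)((data[1] << 8) | data[0]);
    }

    /* A sender may transmit another data packet only while the number of
       outstanding unacknowledged packets is below the window size. */
    static int may_send(uint16_t last_sent, uint16_t last_acked, uint16_t window)
    {
        return (uint16_t)(last_sent - last_acked) < window;
    }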
6 LOS - you are losing
If a host receives a packet for a connection that does not exist (other
than RFC which isn't associated with a particular connection, and CLS which
is safe to ignore), it should interchange source and destination, change
the opcode to LOS, and send the packet back. A host receiving a LOS should
break the connection specified by the destination index and inform the
associated process that something has gone wrong.
This isn't actually necessary, since if a packet is re-transmitted many
times without being acknowledged, the NCP should give up, close the
connection, and inform the user that the foreign host appears to be dead.
7 LSN - listen (never transmitted through the net, see below)
>>Flow and Error Control
The NCPs conspire to ensure that data packets are sent from user
to user with no duplications, omissions, or changes of order.
Each receiver (each end of each connection is a receiver, and also a
sender; think of receivers and senders as little actors inside the NCP) has
a list of buffers containing packets which have been successfully received
and are waiting to be read by the user process, and two packet# variables.
One is the number of the last packet successfully received from the
network. The other is the number of the last packet which has been
acknowledged. If these two are not equal, the receiver needs to send an
acknowledgement "soon."
Acknowledgements are sent by putting the last-received variable into the
"ack packet #" field of an outgoing packet going in the opposite direction on the
appropriate connection, and copying the last-received variable into the
last-acknowledged variable. Where does the outgoing packet come from?
There are two styles of connections, and it should be the responsibility of
the user process to tell the NCP which style of connection this is. The
first is a mostly unidirectional connection. In this case, there isn't
going to be an outgoing packet, and the NCP should generate a NOP packet to
carry the acknowledgement.
The second case is an interactive, or command-response connection. This
could be a telnet connection (at the server end), or a connection between
two programs where one sends a command and the other almost immediately
sends a response. In this case there is likely to be an outgoing packet on
which the acknowledgement can be piggy-backed. Either a user process which
declares this type of connection should be required always to send a packet
to carry the acknowledgement, or the NCP should have a time-out after which
it will generate a NOP to carry the acknowledgement. This scheme is
intended to substantially decrease the total number of packets sent through
the network.
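[A C sketch of the decision just described; the structure and names are
invented, and the time-out test is left abstract.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Per-direction receiver state, as described above. */
    struct chaos_receiver {
        uint16_t last_received;      /* last packet # successfully received */
        uint16_t last_acked;         /* last packet # already acknowledged */
        bool     command_response;   /* connection style declared by the user */
    };

    /* True if the NCP should generate a NOP now just to carry an
       acknowledgement.  For the mostly-unidirectional style an ack is owed
       as soon as last_received gets ahead of last_acked; for the
       command-response style the NCP waits, hoping to piggy-back on an
       outgoing packet, until a time-out expires. */
    static bool need_ack_nop(const struct chaos_receiver *r, bool timed_out)
    {
        if (r->last_received == r->last_acked)
            return false;                  /* nothing unacknowledged */
        if (r->command_response && !timed_out)
            return false;                  /* expect an outgoing packet soon */
        return true;
    }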
When a receiver receives a data packet, it compares the packet # of that
packet with the last-received variable. If it is one more, this packet is
given to the user and everything is normal. The last-received variable is
incremented. An acknowledgement will soon be sent.
If it is higher than 1+ last-received, a packet was lost and will have to
be retransmitted. The receiver could save this packet until the lost one
arrives, then send them both to the user in the correct order, but it is
all right to simply discard the packet, because when the sender retransmits
the lost packet it will also retransmit all packets after the lost packet,
so very little would be gained by saving this packet. Of course, this
assumes that if a host sends out two packets, they generally do not
arrive in reverse order. On a local network such as this one, with only
a single path between any pair of hosts, that is probably a valid
assumption, so we needn't bother with the extra complexity in the NCP's
to save such packets. Nothing in the protocol prevents it from being
put into the NCP's later if it proves to be needed.
If the packet # received is less than or equal to the last-received variable, it
is a duplicate. The most likely cause of duplication is that an
acknowledge was lost. Therefore, the last-acknowledged variable should be
set to one less than the last-received variable, so that another
acknowledge will be sent. Then the packet should be discarded.
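[The packet-number check above as a C sketch (invented names); all arithmetic
is modulo 2**16.]

    #include <stdint.h>

    enum rx_action {
        RX_DELIVER,        /* in order: give to the user, bump last-received */
        RX_DISCARD_DUP,    /* duplicate: arrange to re-acknowledge, then discard */
        RX_DISCARD_GAP     /* a packet was lost; discard, the sender will retransmit */
    };

    static enum rx_action classify_data_packet(uint16_t pkt_num,
                                               uint16_t last_received)
    {
        uint16_t diff = (uint16_t)(pkt_num - last_received);
        if (diff == 1)
            return RX_DELIVER;
        if (diff == 0 || (diff & 0x8000))  /* pkt_num <= last_received */
            return RX_DISCARD_DUP;
        return RX_DISCARD_GAP;             /* higher than 1 + last_received */
    }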
It is always okay for the receiver to limit the amount of buffer space
available to any one connection, and to discard any incoming packet for which
it can't find buffer space. If this happens a lot, a WIN packet should
be sent back to the sender to decrease the window size, because evidently
the sender is sending information faster than the receiving user process
can absorb it.
Receivers should keep counts of how many packets were disposed of in each
of the above ways.
The sender has a list of packets which have been entrusted to it by the
user for transmission and one packet # variable, the number of the last
packet sent by the user. When the user next sends a packet, the sender
will increment this variable and set the packet# of the sent packet to the
result. The sender also sets the source and destination host numbers and
indices of the packet, sets the "ack packet #" to the last-received
variable of its corresponding receiver, sets the receiver's last-
acknowledged variable to that, and gives the packet to the transmission
medium for "immediate" transmission (perhaps it has to wait its turn in a
queue.) It also saves the packet on a list, in case retransmission
is required.
With each buffered packet the sender holds in trust, it remembers the time
that packet was last transmitted. From time to time "retransmission"
occurs. The sender gives one or more packets from its list to the
transmission medium. It always starts with the oldest, so as to keep
things in order, and sends the rest in order until it gets to one that was
transmitted too recently to do again. Retransmission is used to recover
from lost or damaged packets, lost or damaged acknowledgements, and packets
discarded by the receiver due to lack of buffering capacity.
Each time a receiver receives a packet, it gives the "ack packet #" from
that packet to its corresponding sender. The sender discards any packets
with numbers less than or equal to that, since their successful receipt has
just been acknowledged.
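[A C sketch of the sender's bookkeeping just described: discarding
acknowledged packets and choosing which packets to retransmit. The array
representation and the names are inventions of this sketch.]

    #include <stdint.h>

    /* One packet held in trust by the sender, kept in sending order. */
    struct pending_packet {
        uint16_t packet_num;        /* packet # assigned when first sent */
        uint32_t last_sent_time;    /* time of the last (re)transmission */
    };

    /* Discard every pending packet numbered less than or equal to ack
       (modulo 2**16), since its receipt has just been acknowledged.
       Returns the new count. */
    static int discard_acknowledged(struct pending_packet *list, int count,
                                    uint16_t ack)
    {
        int kept = 0;
        for (int i = 0; i < count; i++) {
            uint16_t diff = (uint16_t)(list[i].packet_num - ack);
            if (diff != 0 && !(diff & 0x8000))   /* still unacknowledged */
                list[kept++] = list[i];
        }
        return kept;
    }

    /* Retransmission pass: starting with the oldest packet, count how many
       should be resent, stopping at the first one sent too recently. */
    static int packets_to_retransmit(const struct pending_packet *list, int count,
                                     uint32_t now, uint32_t interval)
    {
        int n = 0;
        while (n < count && (uint32_t)(now - list[n].last_sent_time) >= interval)
            n++;
        return n;    /* resend list[0] through list[n-1], oldest first */
    }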
A transmitter keeps re-transmitting packets until they get acknowledged.
Since the acknowledge is from the ultimate receiver to the original sender,
packets will eventually get sent correctly even if some link somewhere in
the network occasionally loses packets. This end to end acknowledgement
also allows flow control to be done by ignoring packets for which the
receiver does not have buffer space, which is considerably simpler than
"allocation" schemes, and allows a higher rate of through-put [probably].
There are no negative acknowledgements in this scheme. If a packet is
garbled, how do you know who to send the negative acknowledgement to?
(But see the LOS packet.)
In order to make efficient use of the network, the number of useless
retransmissions has to be limited. But in order to provide reasonable
response even in the face of errors, the time-out before retransmission
should not be too long. To win, this time-out should be adjusted according
to circumstances. The first retransmission should happen fairly quickly,
but then the time-out should be increased so that a connection blocked by
some persistent error does not tie up any significant fraction of network
transmission capacity. Also note the similarity to the re-transmission at
the low level in the Ether network.
The interval from first transmission to retransmission, and the interval
between retransmissions, and the number of retransmissions before giving up
and declaring the destination host to be dead, are parameters yet to be
determined. The intervals might be dynamically altered. The maximum
number of packets the sender will accept from the user before making the
user wait is also a parameter, which is initially set by the user to
reflect the intended use of the connection. In fact, this is identical
to the window size. Perhaps the interval before retransmission
should be dynamically adjusted according to the observed mean time for
acknowledgements to come back.
In addition, since flow control is being done statistically, the rate of
transmission should be adjusted to approximate the capacity of the
receiver. If the receiver is a 10 character per second teletype, it does
no good to retransmit a 100-character packet every 50 milliseconds. On the
other hand, if the receiver is a file system computer usually able to
accept input faster than the network can send it (or than the transmitter
can generate it), maximum throughput can be achieved by having several
packets outstanding in the network simultaneously, and not waiting for the
acknowledge for one before sending the next.
Control over this will be provided in two ways. The first is explicit
control by users. When a user process opens a connection, it specifies the
window size for the receive side. The send side at the other host will get
informed of this window size. The NCP will use this as a guideline for how
many buffers to grant to the connection. (Most NCPs will probably allocate
and free buffers dynamically as packets arrive and depart, rather than
allocating the maximum required number of buffers as part of "OPEN".) Thus
telnets will have only one or two buffers, and file-transfers may want 30
or 40 buffers.
Note that the window size for a direction of a connection is set by the
receiver, but the sender can use a smaller one if he prefers.
The second form of control is done by the NCP. If an otherwise correctly
received packet must be discarded because there is no input buffer space,
the NCP may send a "WIN" packet back to decrease the window size and inform
the transmitting NCP that it is sending too fast and should slow down. (It
will slow down anyway, as it keeps retrying and losing, but an explicit
slow-down message may help make things more efficient.) The receiving NCP
should not send a slow-down every time it discards a packet, since it may
take the sending NCP a while to slow down (e.g. there may be several
packets outstanding in the network.) Similarly, if packets don't seem to be
coming in fast enough to keep the buffers full, a "WIN" packet which
increases the window size could be sent to encourage the sending NCP to
speed up. [The exact right way to use this needs to be worked out.]
[[Superly true.]]
To avoid packets left over from old connections causing problems
with new connections, we do two things. First of
all, packets are not accepted as input unless the source and
destination hosts and indices correspond to a known, existent
connection. By itself, this should be adequate, provided that
retransmission is only done by the originating host, not by intervening
gateways and bridges in the network. This is because we can safely
assume that when a host agrees to open a connection with a certain
index number at its end, it will give up on any previous connection
with the same index, and therefore it won't retransmit any old packets
with that index once it has sent out a new RFC or OPN. The indications
are that our network will be "local" enough that indeed retransmission
will only be done by the original host.
Problems could still occur if packets get out of order, so that
an OPN establishing a new connection gets ahead of a data packet
for an old connection with the same index. To protect against
this, it is necessary to assure that at least a few seconds
elapse before an index number is reused. This could be done
either by remembering when an index is last used, or by reserving
part of the 16-bit index number as a uniquization field, which
is incremented each time an otherwise-the-same index is reused.
Which method is chosen is at the discretion of each local NCP.
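[A C sketch of the uniquization-field alternative; the 11/5 split of the
index and the names are arbitrary choices made only for this example.]

    #include <stdint.h>

    /* The low 11 bits select a slot in the NCP's connection table; the high
       5 bits are a reuse counter bumped each time that slot is reused. */
    static uint16_t make_index(uint16_t slot, uint16_t reuse_count)
    {
        uint16_t index = (uint16_t)((reuse_count << 11) | (slot & 0x07FF));
        if (index == 0)
            index = (uint16_t)(1u << 11);  /* indices are required to be non-zero */
        return index;
    }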
Another necessary assumption is that when a system crashes and
is reloaded (thus forgetting any remembered information about
which indices were in use when and so forth), the time to
reload it is more than a few seconds.
Problems could occur not only with left over data packets,
but also with left over control packets. This isn't too much
of a problem since control packets are not retransmitted,
but it could still happen that a host gets faked out into
thinking that it has a connection to another host that the
other host doesn't know about. In this case, it should just
look like the connection was opened and then immediately closed,
since the other host won't generate any data packets and won't
accept any.
>>Media handlers
A host may be connected to more than one transmission medium. It has
service programs for each.
When a packet is received, if the opcode is RFC, it is handled specially.
The contact name is compared against those of all the indices which are
in the Listening state. If a match is found, that index is put into the
RFC-received state, its LSN packet is discarded, and the RFC packet is put
into its input list so that the server process can see it. If no server
is listening to that contact name, the RFC packet is placed on the
pending-RFC list, and a server process is created which will load itself
with a suitable program to open an index in "server" mode, gobble an
RFC packet, look at the contact name, and either reload itself with the
appropriate server program or send a CLS reply.
When a non-RFC packet is received, the system must look for a receiver
index to handle it. If none is found, or the state is wrong, or the
source host and index don't match, a LOS should be sent unless the
received packet was a LOS. Otherwise, if the received packet is
WIN, it is processed and discarded. Other packets are given to the
user; OPN and CLS cause a state change but are also given to the
user as input.
The transmitting side of a transmission medium handler has a queue
of packets to be transmitted. It should send them out, in order,
as fast as possible, except that if a receiving host has no buffer
space (which can be detected because its chaosnet interface will
cause "interference" on the ether), it should look down the list
for another host to send to. As long as packets to the same host
are sent in the order they are queued, everything will be all right.
In addition, when the packets are put into the transmit queue,
the destination host number has to be looked up in a table to
determine which transmission medium to use to get to it and
(in the case of ether) which physical host number to put in the
packet trailer for the hardware.
>>Buffers
In ITS, the buffering scheme will be as follows. There will be a pool of
32-word packet buffers available. When it runs out, more can be made. When
there are many free, some can be flushed. 32-word buffers are made out of
the 128-word buffers that already exist. The reason this is done rather
than just making them out of 1024-word pages is to reduce system core usage
because often there will be only one or two pages worth of both 32-word and
128-word buffers, so they should share the same pages.
Each packet buffer has a two-word header, and 30 words which can hold a
packet. Packet buffers can be on one (or sometimes two) of six lists:
The free list. The receive list, of which there is one for each index.
The send list, of which there is one for each index. The transmission
list. The pending-RFC list. The pending-LSN list.
The free list contains packet buffers which are free. They are threaded
together through addresses in the first word. Zero ends the list.
The transmission list contains packets which are to be transmitted out
on the network "immediately". At interrupt level packets are pulled
off of this list and sent. (There might be more than one transmission
list if a machine is connected to more than one physical medium.)
The transmission list is threaded through addresses in the left half
of the first word. Zero ends the list. After transmission -1 is stored
to indicate that the packet is not on the transmission list any more.
If the right half of the first word is -1, indicating that the packet
is not also on a send list, it is returned to free.
Each send list contains packets for a particular connection which have
been entrusted to the system by the user to be sent, but have not yet
been acknowledged. They are threaded together through the right half
of the first word. The second word contains the time that the packet
was last transmitted (actually, the time that it was last put on the
transmission list.)
Each receive list contains packets which have been received on a particular
connection and not yet read by the user. They are threaded together
by addresses in the first word, and the list ends with zero.
The pending-RFC list contains request-for-connection packets which have
not yet been either accepted or rejected. It is threaded together
through the first word. When a server process finishes getting created
and loaded, it will take an RFC off the pending-RFC list and put it
on its own receive list. The second word of these packets contains
the time received so that the system can know when something has gone
wrong and they should be thrown away.
The pending-LSN list contains LSN packets for all the listening users.
These packets are just used as a handy place to save the contact name
being listened to. It is threaded together through the first word.
The source-index field in the packet header can, of course, be used
to find which user this packet belongs to.
>>ITS System Calls
(Other systems would have similar calls, with appropriate
changes for their own ways of doing things.)
OPEN
Not allowed. (I said this wasn't a "standard" device!)
Instead use:
CHAOSO
arg 1 - receive channel number
arg 2 - transmit channel number
First, the two specified channels are closed. Then an index
is assigned to the user and the two channels are set up to
point to it. Two channels are used since in general ITS
channels are unidirectional, and to allow the user to
handle receive and transmit interrupts differently.
The created index is placed in the Closed state. To set up
a connection, IOT an RFC or LSN packet down the transmit
channel.
IOT
Always transfers exactly one packet. The effective
address of .IOT is the address of a 30.-word block
which contains the packet. .CALL IOT should be given
an immediate second argument which is the address of
the 30.-word block. .CALL SIOT is not allowed.
The format of the 30.-word block is:
16 16 4
-----------------------------------------
| opcd | nbytes | unused | 0 |
-----------------------------------------
|destination host |destination index| 0 |
-----------------------------------------
| source host | source index | 0 |
-----------------------------------------
| packet # | ack packet # | 0 |
-----------------------------------------
| data1 | data2 ...
... data104 |
-----------------------------------------
In the descriptions below, if an error is said to
occur, that means IOC error 10. (channel in illegal mode)
is signalled.
In the case of an output IOT, the user sets only
the opcode, nbytes, and data-n fields. When the
NCP copies the packet into a buffer in the system
it sets the other fields of the header to the
appropriate values.
This is not completely true. When outputting an RFC,
the user sets the destination host field, and sets the
ack packet # to the receive window size desired. The user
also sets the window size when outputting an OPN.
The NCP checks for the following special values
in the opcode field of a packet output by the user:
RFC - error if the index is not in the Closed state.
The packet is transmitted (but not queued for
possible retransmission) and the index enters
the RFC-sent state. The user should do an input
IOT which will wait for the OPN or CLS reply
packet to arrive. The NCP also copies and saves
the user-specified host number and window size.
LSN - error if the index is not in the Closed state.
It is put into the Listen state. The packet
is not transmitted, but it is saved so that
when an RFC comes in the system can compare
the contact names. (Note- LSN is a special
opcode which is never actually transmitted
through the net.) The pending-RFC list is searched
to see if an RFC with the same contact name has
been received. If so, it is given to this index
as if it was received just after the LSN was
sent out.
OPN - error if the connection is not in the RFC-received
state. It is put into the Open state. The
packet is transmitted (but not queued for
retransmission, since until it is received
the other end does not know what index to
send acknowledgements to.) The system also
copies and remembers the window size.
CLS - error if the connection is not in the Open
or the RFC-received state. It is put into
the Closed state and the packet is transmitted
(but not queued for retransmission). This packet
may optionally contain data bytes which are
an ascii excuse for the close.
200 or higher - This is a data packet. Error if the
connection is not in the Open state. A packet#
is assigned, the destination and source fields
are filled in, and the packet is transmitted and
queued for retransmission.
Any other opcode causes an error.
In the case of an input IOT, the user will get an error
if the connection is in the Closed or Broken state,
except if it is in the Closed state and there are data
packets queued. This is so that the user can read the
CLS packet. Otherwise, it will hang until a packet
arrives, then return the packet into the user's
30.-word block.
The user should check the sign bit of the first word,
which will be set if this is a data packet. The
non-data packets which can get given to the user are
RFC, OPN, and CLS.
CLOSE
Immediately closes the connection. All buffers and other
information associated with the index are discarded. Normally
the user should first IOT a CLS
packet containing an ascii explanation for why it is
closing. Note that any data previously written on the
connection but not yet received by the other end will be
lost. User programs should exchange "bye" commands of some
sort before closing if they care about losing data. It is
done this way to keep the NCP simple.
RESET
Does nothing.
FORCE
Does nothing.
FLUSH
On an output channel, does FORCE and then waits until
there are no queued output buffers. I.e., waits for
all output to be received and acknowledged by the foreign
host.
RCHST
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 -1
RFNAME
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 0 or 1 ?
WHYINT
val 1 - %WYCHA
val 2 - state
val 3 - number of packets queued (receive,,transmit)
val 4 - window size (receive,,transmit)
LH(val 3) is the number of packets available to input IOT.
RH(val 3) is the number of packets which have been transmitted
by output IOT but which have not yet been received and
acknowledged by the foreign host.
The state codes are:
%CSCLS Closed
%CSLSN Listen
%CSRFC RFC-received
%CSRFS RFC-sent
%CSOPN Open
%CSLOS Broken by receipt of "LOS" packet.
%CSINC Broken by incomplete transmission (no acknowledge
for a long time)
NETBLK
Similar to Arpanet NETBLK.
STYNET
This should work the same as on the Arpanet. It will
not be implemented initially, however.
CHAOSQ
arg 1 - address of a 30.-word block (packet buffer)
This is a special system call for use by the ATSIGN CHAOS
program, which is a daemon program that gets run when
an RFC is received that does not match up against an
existing LSN.
The first packet on the pending-RFC queue is copied
into the packet buffer, then moved to the end of the
queue (so that the right thing happens when several
RFC's are pending at the same time.)
The call fails if the pending-RFC queue is empty.
The program should use the contact name in this
packet to choose a server program to execute. This
server program will then LSN to (presumably) the same
contact name, thus picking up the RFC.
Interrupts
IOC error interrupts occur if an attempt is made to IOT
when the connection is in an improper state, or to IOT
a packet with an illegal opcode.
An I/O channel interrupt is signalled on the input channel
when the number of queued buffers changes from zero to
non-zero.
An I/O channel interrupt is signalled on the output channel
when the number of queued buffers changes from greater or
equal to the window size, to less than the window size.
An I/O channel interrupt is signalled on the input channel
when the connection state changes.
Interrupts can be used for
(1) detecting when input arrives.
(2) detecting when the system is willing to accept
output.
(3) detecting when the other end does a CLOSE.
(4) detecting when a requested connection
is accepted or rejected.
(5) detecting when a request for connection
comes into a listening server.
>Comparison with LCSnet, and other blathering.
>>Principal differences
The LCSnet proposed protocol is called DSP. The Chaosnet protocol will
just be called chaos in this section.
(1) DSP specifies things in terms of bytes where Chaosnet specifies
them in terms of packets. We choose packets to increase the simplicity
and efficiency of the scheme. DSP has to work in terms of bytes because
it allows packets to be reformatted en route, hence
(2) DSP assumes that gateways can exist between networks with the same
protocols but different packet sizes. Therefore, the protocol has to
allow for the fact that packets may be reformatted en route. I happen
to believe that this situation is extremely unlikely to exist, and in
fact gateways between "different" networks will have to do much more
than just change the packet size. Therefore, it makes sense to make
the gateway worry about gateway issues, rather than have them permeate
the whole protocol. I believe that gateways will look more like
regular network ports than like transmission media; to get from a host
on the local net to a host on the arpa net, one will connect to the
arpa net gateway and ask it to open a connection from itself to the
host on the arpa net, then tie those connections together. A gateway
will perform not only packet reformatting, but protocol translation,
flow control on both sides, and maybe even character set translation.
There can also be entities called "bridges", which connect two networks
(or two separate segments of one network) with the same protocol. A bridge
simply forwards any packets it receives, but never alters the packets,
and never looks inside them except to find out where to forward them to.
(3) A related difference is that DSP includes the arpa net, and TCP,
and by extension all the networks in the universe, in its port number
address space. Chaosnet would have you connect to a gateway, then
send the gateway the port number to connect to in the foreign
address space separately.
(4) Chaosnet has an "opcode" field in the packet header, where DSP
does not. DSP achieves the same effect with various bits here and
there. It makes little difference unless user-level programs decide
to exploit the opcode field.
(5) DSP and Chaosnet have quite different mechanisms for creating
connections. In DSP, one never creates a connection, exactly;
one simply starts sending to a port address. Local network note
#3 mumbles about how one discovers which port address to send to,
but I have not seen any specifics. In Chaosnet, the mechanism
for finding out where to send to and the mechanism for creating
a connection are intertwined; the excuse is that often the process
being connected to is created at the same time as the connection.
(6) DSP uses unique, never-reused port IDs. Chaosnet does not.
The problem with unique, never-reused IDs is that I know of no
system that can implement them. Multics comes close, with the
aid of a special hardware clock. The clock is set from the
operator's watch when the system is powered on, and the mechanism
depends on the fact that the error in the operator's watch is
less than the time required to bring up the system after a power
failure. Small systems that cannot afford special hardware just
for this, and don't have permanent disk storage, would find it
very hard to generate unique IDs.
Chaosnet prefers to rely on a scheme that doesn't require special
hardware, but nearly always works. By requiring a connection
to be opened before data packets can be sent through it, and by
some assumptions about the structure of the network, the problem
is eliminated. See the Flow and Error Control section for
further discussion.
(7) DSP closes the two directions of a connection separately. Why?
>>High priority data packets, interrupts, and flushing.
The basic idea is to note that if you want to send a high priority
message, this means you want it out of order with respect to previously-
sent data on some connection. Therefore, high priority data should
be sent over an auxiliary connection. The per-connection overhead
is not prohibitively high, and this eliminates vast quantities of
hair from the innermost portion of the system.
One advantage that DSP gains by having "high priority messages"
built into the system is that it also incorporates a standardized
way to "mark" a particular point in a data stream. However, this
is comparatively unimportant, particularly since I think high-priority
messages will probably never get used. The only place I've heard
them proposed to be used is with Telnet, but ordinary terminals
get along quite well without "out of band" signals when used with
reasonable operating system software.
Interrupts and flushing of input are random crocks associated
with high priority messages. I don't propose to implement them either.
>>Datagrams. (connections only used to pass a single packet.)
These are easy. The guy who wishes to send a datagram does
OPEN, IOTs an RFC to the service to which the gram is to be
sent, and NETBLKs waiting for the connection to open up. He
then IOTs the data packet, FLUSHes waiting for it to get there,
then CLOSEs.
The server OPENs and IOTs an OPN in response to the RFC. She
then IOTs in the datagram packet, CLOSEs, and goes off processing
the message.
Four packets are transmitted, two in each direction. (An RFC, an OPN,
a DATA, and an ACKing NOP.) No need to send any CLS messages, since
each user process knows to do a CLOSE system call after one data
packet has been transmitted. It has been claimed that this is
the theoretical minimum if acknowledgement is required. The reason
is that the data packet must contain some unique id generated by
the RECEIVER to avoid duplicates, and it must be acknowledged,
so that's two packets in each direction, with no combining possible.
>>Why not multiple messages per packet?
[1] Not needed for data. The division of the data stream into
packets is invisible to the real user, anyway. It's only used by
the "user-ring" portion of the network system software.
[2] Extra complexity. Consider the hair involved with packed
control messages in the Arpanet. Because of the control link being
shared between multiple connections between the same pair of hosts,
this could save a little. I don't know of any NCP that does this;
furthermore, having that shared facility is a bad idea. The only
case in the Arpanet where packing two control messages into one
packet is useful is when, on opening a connection, the receiver wants
to send both STR and ALL. In this protocol we just put the window
size in as part of the RFC and OPN messages.
[3] There is an argument that having message boundaries separate
from packet boundaries is useful because gateways between different
networks may need to split up packets because the two networks
may have different maximum packet sizes. My feeling about this
is that the gateway is likely to have to do a good deal more than
that. It seems like too much to wish for that the two networks
should use exactly the same packet format, protocols, or even character
set; so the gateway rather than being a packet-reformatting device
is much more likely to look like a user program with two connections,
one on one network and one on the other, which passes data between
the connections with appropriate conversion. In particular, flow
control is likely to be host1 to gateway and host2 to gateway,
rather than host1 to host2.
>>Why not have a checksum in the packet?
This network is likely to have a very diverse collection of machines
on it, which means it will be impossible to define a checksum which
can be computed efficiently in software on all machines. Now all
hardware links in the system ought to have whatever amount of
hardware checking is appropriate to them, but due to the efficiency
costs of a user-level end to end checksum, this should not be
a built-in requirement of the basic low-level protocol. Instead,
checksumming should be an optional feature which some higher-level
protocols (those that need it because the data being passed through
them is so vitally important that every possible effort must be made
to ensure its correctness) may implement. Checksumming should
be implemented at the user level in exactly the same way and for exactly
the same reasons as encryption should be implemented at the user level.
>>How about host-independent user-level protocols, where one just
connects to a service and doesn't have to know what host it's at today?
Yeah, how about it? As far as I know, this protocol provides an
adequate base for constructing such a thing. Also I haven't
seen anything published on the subject.
>>Very small hosts.
E.g. we'd like to put the Chess machine on the net. It has very little
memory, but not totally impotent microcode. A small host need only
support one connection, may ignore WIN, LOS, and CLS, may only have one packet
in transmission at a time, and may process receive packets one at a time
(ignoring any that come in until the first one has been fully processed).
It IS necessary to check that received DATA packets come in in the right order.
RFC may be handled by remembering the other guy's host number and index,
and sending back a completely canned OPN. The contact name is ignored.
If a second user tries to connect while a first user is connected,
the first user gets bumped. Let them fight it out on some larger
machine (or the floor) for who will get to use the small machine.
Never originate any packet type other than DATA and that one OPN.
Attaching ordinary terminals "directly" to the network is obviously
undesirable.
>Transmission Media
This section describes how packets are encapsulated for transmission
through various media, and what auxiliary hair is needed by each
medium.
>>Ethernet
The messages transmitted through the ether (or CAIOS) net consist of
a packet followed by a three-word trailer:
+----------------+
| header | 8 words
+----------------+
| data | 0-52 words
+----------------+
| immediate dest | 1 word
+----------------+
| immediate src | 1 word
+----------------+
| CRC check word | 1 word
+----------------+
The three trailer words are looked at by the hardware; the last two
of them are supplied by the hardware. The reason this stuff is in
a trailer rather than a leader is that the caiosnet hardware actually
transmits the packet backwards. However, this is transparent to
the software.
Bytes are sent two per word. The low-order byte is first (pdp11 standard).
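[A C sketch of the encapsulation, assuming that the trailer immediately
follows the last 16-bit word of the packet; the names are invented.]

    #include <stddef.h>
    #include <stdint.h>

    /* Append the three trailer words to a packet already serialized into
       'frame' as packet_words 16-bit words (8 header words plus data words).
       Software fills in only the immediate destination; the immediate source
       and the CRC check word are supplied by the hardware. */
    static size_t encapsulate_ether(uint16_t *frame, size_t packet_words,
                                    uint16_t immediate_dest)
    {
        frame[packet_words + 0] = immediate_dest;
        frame[packet_words + 1] = 0;       /* immediate source (hardware) */
        frame[packet_words + 2] = 0;       /* CRC check word (hardware) */
        return packet_words + 3;           /* total length in 16-bit words */
    }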
>>TEN11 Interface
[The following total hair has not been checked recently.]
Since interrupts can't be sent through the TEN11 interface, the pdp10 can
only service the network at relatively-infrequent intervals, for instance
every 1/60'th of a second. Therefore it is necessary to have queues of
packet buffers in each direction. This provides high speed by allowing
several packets to be processed at a time.
The speed and reliability of the TEN11 interface eliminates any need for
error checking. (ha ha) [ho ho] <he he> To decrease the load on the AI
pdp10, it is assumed that the pdp11's will be responsible for swapping the
bytes in the data portions of packets so that they will be in pdp10
standard order.
Even though the contents of packets will not be error-checked, the
pdp10 must check addresses to avoid being screwed by a losing pdp11.
The form of a packet encapsulated for the TEN11 interface will then be
|-----------------|-----------------|----|
| queue thread | 0=empty, 1=full | 0 |
|-----------------|-----------------|----|
| #bytes | opcode | unused | 0 |
|-----------------|-----------------|----|
|destination host |destination index| 0 |
|-----------------|-----------------|----|
| source host | source index | 0 |
|-----------------|-----------------|----|
| packet # | ack packet # | 0 |
|-----------------|-----------------|----|
| data 0 | data 1 | data 2 . . . | 0 |
| | 0 |
|-----------------|-----------------|----|
for a total of 31 36-bit words, or 62 pdp11 words.
The queue thread is the pdp11 address of the next packet-buffer in
a queue, or zero if this is the last. The empty/full indicator
says whether this buffer currently contains a packet, or not.
The following is an attempt to express the algorithms of the
pdp10 and pdp11 in concise form. Hopefully they are self-
explanatory.
Several queues of buffers need to exist in the pdp11. Only
two of these are known to the pdp10.
TO10QF - first buffer queued for transmission to the 10.
TO10QL - last buffer queued for transmission to the 10.
         Exists so that buffers can be appended to the
         list more quickly.
TO10AC - first buffer in list of buffers actively being
         gobbled by the 10. Set by 11, cleared by 10.
TO10FR - copy of TO10AC. Used to free the buffers after
         the 10 gobbles them.
(come-from pdp11-packet-receivers
  when (eq (destination-host ?packet) pdp10)
  ;re-arrange 8-bit bytes for 10
  (swap-bytes (data-part packet))
  ;Add to to-10 queue
  (set (thread packet) nil)            ;nil=0
  (cond ((null TO10QF)
         (setq TO10QF packet TO10QL packet))
        (t (set (thread TO10QL) packet)
           (setq TO10QL packet)))
  ;Try to activate to-10 queue
  (cond ((null TO10FR)
         (setq TO10FR TO10QF
               TO10AC TO10QF
               TO10QF nil
               TO10QL nil)))
  )

(come-from pdp11-polling-loop
  when (and (not (null TO10FR))        ;buffers were sent to 10
            (null TO10AC))             ;and 10 has finished gobbling
  (mapcar 'make-buffer-free TO10FR)    ;mapcar follows queue words
  (setq TO10FR nil)                    ;OK to activate more buffers now
  (cond ((not (null TO10QF))           ; more stuff waiting, activate it now
         (setq TO10FR TO10QF
               TO10AC TO10QF
               TO10QF nil
               TO10QL nil)))
  )

(come-from pdp10-clock-level
  when (and (not (null TO10AC))        ;11 is sending buffers
            (not web-buffers-locked))
  ;copy to user, process control message, or reject if buffers full
  (mapcar 'process-input
          TO10AC)
  ;signal pdp11 that all packets have been gobbled
  (setq TO10AC nil))
FRBFLS - list of free buffers. cons-buffer gets from here,
         make-buffer-free puts here.
FM10AC - list of buffers into which pdp10 may place packets
         set by 11 / cleared by 10.
FM10GB - copy of FM10AC, used by 11 to process buffers after
         10 has placed packets into them.
(come-from pdp11-polling-loop
  when (and (null FM10GB)              ;10 needs some buffers &
            (not (null FRBFLS)))       ; free buffers available
  ;give the 10 a list of a suitable number of empty buffers
  (repeat max-at-a-whack times
    (and (null FRBFLS) (exitloop))
    (setq buffer (cons-buffer))          ;pull off free list
    (set (full-indicator buffer) nil)    ;contains no packet
    (set (thread buffer) FM10GB)         ;cons onto list
    (setq FM10GB buffer))
  (setq FM10AC FM10GB)                 ;give buffer list to 10
  )

(come-from pdp11-polling-loop
  when (and (not (null FM10GB))        ;gave 10 some buffers
            (null FM10AC))             ;which it has used
  ;process packets sent from 10.
  (mapcar
    '(lambda (buffer)
       (cond ((null (full-indicator buffer))
              (make-buffer-free buffer))       ;didn't get used
             (t (swap-bytes buffer)
                (send-to-destination buffer))))
    FM10GB)
  (setq FM10GB nil))                   ;no buffers active in 10 now

(come-from pdp10-clock-level
  when (and (not (null FM10AC))        ;buffer space avail
            (not web-buffers-locked))  ;no M.P. interference
  ;send as many packets as possible
  (mapcar
    '(lambda (buffer)
       (cond ((needs-to-be-sent ?packet)       ;find a sendable packet somewhere
              (copy-into buffer packet)
              (set (full-indicator buffer) t))))
    FM10AC)
  ;signal pdp11 to gobble the packets
  (setq FM10AC nil))
To avoid the need for a gross number of exec pages in the pdp10,
the FM10AC and TO10AC words, and all the packet buffers, should
lie in a single 4K page. The best location for this page varies
from machine to machine. On dedicated 11's such as the AI TV11,
the MC console 11, etc. it should probably just be the first 4K
of memory. On the logo machine, it would probably be best to put
this page up in high memory where RUG won't mess with it. In the
case of the mini-robot system, I'm not sure.
It would be best not to try to use this protocol with "general
purpose" machines, because of problems with finding the list
headers and packet buffers, problems with telling whether the
machine is running the right program, etc. It should be used
just as a way for the AI machine to get a path to the net.
>>DL10 & DTE20
[Outline only]
[Just use the pdp11 as a substitute for a direct chaosnet interface.]
[Basically, the 11 says ready to transfer (in either direction), the 10
sets up the pointers and says to transfer, and the 11 transfers the cruft.
To eliminate an extra level of buffering, on input transfers the 11 makes a
copy of the first 4 16-bit words of the header available to the 10 when it first
says "ready to transfer." The 10 uses these to decide where to copy the
packet into. It helps if you don't try to use a DL10 on a machine with a
cache.]
>>Asynchronous line
Packets are encapsulated by preceding them with start of text (202),
and following them with a 1-byte additive checksum and an end of text (203).
The 16-bit words are transmitted low order byte first. If the checksum
is wrong the receiver ignores the packet. The start and end characters are
just there to help in ignoring random noise on the line. If they don't
appear, the packet is ignored. The full 60-word packet is not transmitted;
the #bytes count is used to determine how many of the data bytes to
transmit; the receiver fills the remainder with zero (or garbage, at its
convenience.)
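[A C sketch of the framing, assuming the packet has already been serialized
low-order byte first and that the checksum covers only the packet bytes, not
the framing characters; the names are invented.]

    #include <stddef.h>
    #include <stdint.h>

    #define ASYNC_STX 0202    /* start of text (octal) */
    #define ASYNC_ETX 0203    /* end of text (octal) */

    /* Frame a serialized packet (header plus only the first nbytes data
       bytes) for the asynchronous line.  'out' needs room for len + 3 bytes;
       the framed length is returned. */
    static size_t frame_async(const uint8_t *packet, size_t len, uint8_t *out)
    {
        uint8_t sum = 0;
        size_t n = 0;
        out[n++] = ASYNC_STX;
        for (size_t i = 0; i < len; i++) {
            sum = (uint8_t)(sum + packet[i]);
            out[n++] = packet[i];
        }
        out[n++] = sum;                /* 1-byte additive checksum */
        out[n++] = ASYNC_ETX;
        return n;
    }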
This protocol is intended mainly for communication between the plasma
physics pdp11 in bldg. 38 and a pdp11 in 545, until the caiosnet
gets extended that far (or a longer-distance, lower-speed caiosnet
is extended to various machines off in that direction.)
>Higher-Level Protocols
>>Telnet
Essentially this should follow "New Supdup". [Unresolved issues with
respect to output reset remain.]
>>File-Access
[*****]
>>Mail
[*****]
>>Locate-named-service
[*****]